93 research outputs found
3D Depthwise Convolution: Reducing Model Parameters in 3D Vision Tasks
Standard 3D convolution operations require much more memory and
computation than 2D convolution operations. This fact has hindered the
development of deep neural nets in many 3D vision tasks. In this paper, we
investigate the possibility of applying depthwise separable convolutions in the 3D
scenario and introduce the use of 3D depthwise convolution. A 3D depthwise
convolution splits a single standard 3D convolution into two separate steps,
which drastically reduces the number of parameters in 3D convolutions, by
more than one order of magnitude. We experiment with 3D depthwise convolution
on popular CNN architectures and also compare it with a similar structure
called pseudo-3D convolution. The results demonstrate that, with 3D depthwise
convolutions, 3D vision tasks like classification and reconstruction can be
carried out with more lightweight neural networks while still delivering
comparable performance.
Comment: Work in progress
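The parameter saving claimed above can be checked with simple arithmetic. The sketch below assumes the two steps are a per-channel k x k x k depthwise convolution followed by a 1 x 1 x 1 pointwise convolution, by analogy with the usual 2D depthwise separable design; the abstract does not spell this out. The channel counts and kernel size are illustrative values, not taken from the paper, and bias terms are ignored.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a standard 3D convolution with a k x k x k kernel (no bias)."""
    return c_in * c_out * k ** 3


def depthwise_separable_conv3d_params(c_in, c_out, k):
    """Weights in an assumed depthwise (per-channel) + pointwise (1x1x1) pair."""
    depthwise = c_in * k ** 3   # one k x k x k filter per input channel
    pointwise = c_in * c_out    # 1 x 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer: 64 input channels, 128 output channels, 3 x 3 x 3 kernel.
standard = conv3d_params(64, 128, 3)
separable = depthwise_separable_conv3d_params(64, 128, 3)
ratio = standard / separable

print(standard, separable, round(ratio, 1))
```

For this illustrative layer the standard convolution has 221184 weights and the separable pair has 9920, a reduction of roughly 22x, which is consistent with "more than one order of magnitude".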
Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency
In this paper, we introduce a novel unsupervised domain adaptation technique
for the task of 3D keypoint prediction from a single depth scan or image. Our
key idea is to utilize the fact that predictions from different views of the
same or similar objects should be consistent with each other. Such view
consistency can provide effective regularization for keypoint prediction on
unlabeled instances. In addition, we introduce a geometric alignment term to
regularize predictions in the target domain. The resulting loss function can be
effectively optimized via alternating minimization. We demonstrate the
effectiveness of our approach on real datasets and present experimental results
showing that our approach is superior to state-of-the-art general-purpose
domain adaptation techniques.
Comment: ECCV 201
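The view-consistency idea can be illustrated with a toy penalty. This is only a sketch under stated assumptions, not the loss used in the paper: it assumes the keypoints predicted from each view have already been mapped into one shared object frame, and it penalizes the spread of each keypoint around its mean over views. The function name and data layout are hypothetical.

```python
def view_consistency_loss(predictions):
    """Mean squared deviation of each keypoint from its cross-view mean.

    predictions[v][k] is the (x, y, z) position of keypoint k predicted
    from view v, assumed to be expressed in a shared object frame.
    """
    n_views = len(predictions)
    n_keypoints = len(predictions[0])
    total = 0.0
    for k in range(n_keypoints):
        # Average position of keypoint k over all views.
        mean = [
            sum(predictions[v][k][axis] for v in range(n_views)) / n_views
            for axis in range(3)
        ]
        # Squared distance of each view's prediction from that average.
        for v in range(n_views):
            total += sum(
                (predictions[v][k][axis] - mean[axis]) ** 2 for axis in range(3)
            )
    return total / (n_views * n_keypoints)


consistent = [[(0.0, 1.0, 2.0)], [(0.0, 1.0, 2.0)]]
inconsistent = [[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
```

The penalty is zero when all views agree and grows as the per-view predictions drift apart, so it needs no labels on the target-domain instances.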
Discrete Point Flow Networks for Efficient Point Cloud Generation
Generative models have proven effective at modeling 3D shapes and their
statistical variations. In this paper we investigate their application to point
clouds, a 3D shape representation widely used in computer vision for which,
however, only few generative models have yet been proposed. We introduce a
latent variable model that builds on normalizing flows with affine coupling
layers to generate 3D point clouds of an arbitrary size given a latent shape
representation. To evaluate its benefits for shape modeling we apply this model
for generation, autoencoding, and single-view shape reconstruction tasks. We
improve over recent GAN-based models in terms of most metrics that assess
generation and autoencoding. Compared to recent work based on continuous flows,
our model offers a significant speedup in both training and inference times for
similar or better performance. For single-view shape reconstruction we also
obtain results on par with state-of-the-art voxel, point cloud, and mesh-based
methods.
Comment: In ECCV'2
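An affine coupling layer, the building block named above, can be sketched for a single 3D point as follows. This is a minimal illustration, not the paper's model: the conditioner `scale_and_shift` is a hypothetical closed-form stand-in for a learned network, and the conditioning on a latent shape representation is omitted.

```python
import math


def scale_and_shift(x_fixed):
    """Toy stand-in for a learned conditioner network (hypothetical)."""
    log_scale = [math.tanh(0.5 * x_fixed), math.tanh(-0.3 * x_fixed)]
    shift = [0.1 * x_fixed, 0.2 * x_fixed + 1.0]
    return log_scale, shift


def coupling_forward(point):
    """Keep the first coordinate; affinely transform the other two."""
    x0, rest = point[0], point[1:]
    s, t = scale_and_shift(x0)
    y_rest = [r * math.exp(si) + ti for r, si, ti in zip(rest, s, t)]
    log_det = sum(s)  # log |det Jacobian| of the transformation
    return [x0] + y_rest, log_det


def coupling_inverse(point):
    """Exact inverse: the untouched coordinate reproduces the same s and t."""
    y0, rest = point[0], point[1:]
    s, t = scale_and_shift(y0)
    x_rest = [(r - ti) * math.exp(-si) for r, si, ti in zip(rest, s, t)]
    return [y0] + x_rest


p = [0.4, -1.2, 2.0]
q, log_det = coupling_forward(p)
p_back = coupling_inverse(q)
```

Because the inverse and the log-determinant are available in closed form, each layer is a discrete, exactly invertible step with no ODE solver involved, which is the general property that distinguishes this kind of flow from continuous flows.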
Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors
The impressive performance of deep convolutional neural networks in
single-view 3D reconstruction suggests that these models perform non-trivial
reasoning about the 3D structure of the output space. However, recent work has
challenged this belief, showing that complex encoder-decoder architectures
perform similarly to nearest-neighbor baselines or simple linear decoder models
that exploit large amounts of per-category data in standard benchmarks. On the
other hand, settings where 3D shape must be inferred for new categories with few
examples are more natural and require models that generalize about shapes. In
this work we demonstrate experimentally that naive baselines do not apply when
the goal is to learn to reconstruct novel objects using very few examples, and
that in a few-shot learning setting, the network must learn concepts
that can be applied to new categories, avoiding rote memorization. To address
deficiencies in existing approaches to this problem, we propose three
approaches that efficiently integrate a class prior into a 3D reconstruction
model, allowing it to account for intra-class variability and imposing an implicit
compositional structure that the model should learn. Experiments on the popular
ShapeNet database demonstrate that our methods significantly outperform existing
baselines on this task in the few-shot setting.
Learning Shape Priors for Single-View 3D Completion and Reconstruction
The problem of single-view 3D shape completion or reconstruction is
challenging, because among the many possible shapes that explain an
observation, most are implausible and do not correspond to natural objects.
Recent research in the field has tackled this problem by exploiting the
expressiveness of deep convolutional networks. In fact, there is another level
of ambiguity that is often overlooked: among plausible shapes, there are still
multiple shapes that fit the 2D image equally well; i.e., the ground truth
shape is non-deterministic given a single-view input. Existing fully supervised
approaches fail to address this issue, and often produce blurry mean shapes
with smooth surfaces but no fine details.
In this paper, we propose ShapeHD, pushing the limit of single-view shape
completion and reconstruction by integrating deep generative models with
adversarially learned shape priors. The learned priors serve as a regularizer,
penalizing the model only if its output is unrealistic, not if it deviates from
the ground truth. Our design thus overcomes both of the aforementioned levels
of ambiguity. Experiments demonstrate that ShapeHD outperforms the state of the
art by a large margin in both shape completion and shape reconstruction on
multiple real datasets.
Comment: ECCV 2018. The first two authors contributed equally to this work.
Project page: http://shapehd.csail.mit.edu
Learning Free-Form Deformations for 3D Object Reconstruction
Representing 3D shape in deep learning frameworks in an accurate, efficient
and compact manner still remains an open challenge. Most existing work
addresses this issue by employing voxel-based representations. While these
approaches benefit greatly from advances in computer vision by generalizing 2D
convolutions to the 3D setting, they also have several considerable drawbacks.
The computational complexity of voxel encodings grows cubically with the
resolution, thus limiting such representations to low-resolution 3D
reconstruction. In an attempt to solve this problem, point cloud
representations have been proposed. Although point clouds are more efficient
than voxel representations as they only cover surfaces rather than volumes,
they do not encode detailed geometric information about relationships between
points. In this paper we propose a method to learn free-form deformations (FFD)
for the task of 3D reconstruction from a single image. By learning to deform
points sampled from a high-quality mesh, our trained model can be used to
produce arbitrarily dense point clouds or meshes with fine-grained geometry. We
evaluate our proposed framework on both synthetic and real-world data and
achieve state-of-the-art results on point-cloud and volumetric metrics.
Additionally, we qualitatively demonstrate its applicability to label
transferring for 3D semantic segmentation.
Comment: 16 pages, 7 figures, 3 tables
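A free-form deformation in its classical form moves points by displacing a lattice of control points and blending them with Bernstein polynomials. The sketch below assumes that classical trivariate Bernstein formulation on the unit cube; the abstract does not state which basis the paper uses, and the helper names are hypothetical.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial of degree n and index i, evaluated at t."""
    return comb(n, i) * t ** i * (1 - t) ** (n - i)


def make_grid(l, m, n):
    """Undisplaced control points on a regular lattice over the unit cube."""
    return {
        (i, j, k): (i / l, j / m, k / n)
        for i in range(l + 1)
        for j in range(m + 1)
        for k in range(n + 1)
    }


def ffd(point, grid, l, m, n):
    """Deform a point (s, t, u) in the unit cube using the control grid."""
    s, t, u = point
    out = [0.0, 0.0, 0.0]
    for (i, j, k), ctrl in grid.items():
        w = bernstein(l, i, s) * bernstein(m, j, t) * bernstein(n, k, u)
        for axis in range(3):
            out[axis] += w * ctrl[axis]
    return tuple(out)


grid = make_grid(2, 2, 2)
p = (0.25, 0.5, 0.75)
undeformed = ffd(p, grid, 2, 2, 2)

# Shift every control point by 0.1 along z; the point moves by the same amount.
shifted_grid = {idx: (x, y, z + 0.1) for idx, (x, y, z) in grid.items()}
shifted = ffd(p, shifted_grid, 2, 2, 2)
```

The deformation is defined for every point inside the lattice, so the same control-point displacements can be applied to any number of points sampled from a template mesh. In a learned setting, the control-point displacements would be the quantities predicted from the image.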
Associative3D: Volumetric Reconstruction from Sparse Views
This paper studies the problem of 3D volumetric reconstruction from two views
of a scene with an unknown camera. While seemingly easy for humans, this
problem poses many challenges for computers since it requires simultaneously
reconstructing objects in the two views while also figuring out their
relationship. We propose a new approach that estimates reconstructions,
distributions over the camera/object and camera/camera transformations, as well
as an inter-view object affinity matrix. This information is then jointly
reasoned over to produce the most likely explanation of the scene. We train and
test our approach on a dataset of indoor scenes, and rigorously evaluate the
merits of our joint reasoning approach. Our experiments show that it is able to
recover reasonable scenes from sparse views, although the problem remains
challenging. Project site: https://jasonqsy.github.io/Associative3D
Comment: ECCV 202
End-to-end 6-DoF Object Pose Estimation through Differentiable Rasterization
Here we introduce an approximated differentiable renderer to refine a 6-DoF pose prediction using only 2D alignment information. To this end, a two-branched convolutional encoder network is employed to jointly estimate the object class and its 6-DoF pose in the scene. We then propose a new formulation of an approximated differentiable renderer to re-project the 3D object onto the image according to its predicted pose; in this way, the alignment error between the observed and the re-projected object silhouettes can be measured. Since the renderer is differentiable, it is possible to back-propagate through it to correct the estimated pose at test time in an online learning fashion. Finally, we show how to leverage the classification branch to re-project a representative model of the predicted class (i.e., a medoid) instead. Each object in the scene is processed independently, and novel viewpoints that preserve both the arrangement of the objects and their mutual pose can be rendered.
Differentiable renderer code is available at: https://github.com/ndrplz/tensorflow-mesh-renderer
AUTO3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation
This paper targets learning-based novel view synthesis from a single or
limited number of 2D images without pose supervision. In the viewer-centered
coordinates, we construct an end-to-end trainable conditional variational
framework to disentangle the unsupervisely learned relative-pose/rotation and
implicit global 3D representation (shape, texture and the origin of
viewer-centered coordinates, etc.). The global appearance of the 3D object is
given by several appearance-describing images taken from any number of
viewpoints. Our spatial correlation module extracts a global 3D representation
from the appearance-describing images in a permutation-invariant manner. Our
system can achieve implicit 3D understanding without explicit 3D
reconstruction. With an unsupervisely learned viewer-centered
relative-pose/rotation code, the decoder can hallucinate the novel view
continuously by sampling the relative pose from a prior distribution. In various
applications, we demonstrate that our model can achieve comparable or even
better results than pose/3D model-supervised learning-based novel view
synthesis (NVS) methods with any number of input views.
Comment: ECCV 202
Structure and evolutionary origin of Ca2+-dependent herring type II antifreeze protein
DOI: 10.1371/journal.pone.0000548 (PLoS ONE 2(6))